Multiclass Classification with Imbalanced Datasets for Car Ownership Demand Model – Cost-Sensitive Learning


Abstract

In travel demand prediction based on a household car ownership model, using imbalanced data to support transportation policy through machine learning negatively affects the algorithm training process. The data obtained from the study project for expressway preparation in Khon Kaen Province (2015) formed an imbalanced dataset. In other words, the number of members of the minority class is lower than that of the other answer classes, which results in biased classification. Consequently, this research proposed balancing the dataset with cost-sensitive methods applied to decision tree, k-nearest neighbors (kNN), and naive Bayes algorithms. Before creating the 3-class model, the k-fold cross-validation method was applied to classify the data, and the true positive rate (TPR) was defined for validating the model's performance. The outcome indicated that kNN performed best compared with the other algorithms. Before balancing, it provided TPR values of 46.9% and 46.4% for the rural and suburban area types, which are region types with very different imbalance ratios. After balancing (MCN1), these values were 84.4% and 81.4%, respectively.
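The abstract does not spell out how the cost-sensitive kNN or the per-class TPR were computed, so the Python sketch below only illustrates the general idea under stated assumptions. Misclassification costs are set to inverse class frequencies (a common heuristic, not necessarily the paper's MCN1 setting). Each of the k nearest neighbors votes with the cost of its class instead of a count of 1. TPR is computed per class as the share of that class's records predicted correctly, pooled over the out-of-fold predictions of a plain, unstratified k-fold split. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_costs(labels):
    """Assumed cost scheme (not MCN1): misclassifying class c costs N / (K * n_c),
    so rarer classes receive larger costs."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def cost_sensitive_knn_predict(train_x, train_y, query, k, costs):
    """Predict a class by a cost-weighted vote of the k nearest neighbors."""
    # Rank training records by Euclidean distance to the query (stable on ties).
    ranked = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter()
    for _, label in ranked[:k]:
        # Each neighbor votes with the cost of its class instead of 1.
        votes[label] += costs[label]
    return max(votes, key=votes.get)


def per_class_tpr(y_true, y_pred):
    """TPR (recall) per class: correct predictions / actual members of that class."""
    tpr = {}
    for c in sorted(set(y_true)):
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        tpr[c] = sum(p == c for p in preds_for_c) / len(preds_for_c)
    return tpr


def kfold_tpr(xs, ys, k_neighbors=3, n_folds=5, cost_sensitive=True):
    """Pool out-of-fold predictions from a plain k-fold split (index modulo n_folds)."""
    y_pred = [None] * len(ys)
    for fold in range(n_folds):
        train_idx = [i for i in range(len(ys)) if i % n_folds != fold]
        tr_x = [xs[i] for i in train_idx]
        tr_y = [ys[i] for i in train_idx]
        # Costs come from the training folds only; unit costs give plain kNN.
        if cost_sensitive:
            costs = inverse_frequency_costs(tr_y)
        else:
            costs = {c: 1.0 for c in tr_y}
        for i in range(fold, len(ys), n_folds):
            y_pred[i] = cost_sensitive_knn_predict(tr_x, tr_y, xs[i], k_neighbors, costs)
    return per_class_tpr(ys, y_pred)


if __name__ == "__main__":
    # Hypothetical toy data (not the Khon Kaen dataset): class 2 is the minority.
    xs = [(0.1,), (0.2,), (0.3,), (0.4,), (1.0,), (1.1,), (1.2,), (1.3,), (0.9,), (0.8,)]
    ys = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
    print("plain kNN:", kfold_tpr(xs, ys, cost_sensitive=False))
    print("cost-sensitive kNN:", kfold_tpr(xs, ys, cost_sensitive=True))
```

With unit costs, the vote reduces to ordinary majority-vote kNN. Running both settings on the same folds therefore gives a before/after comparison of the kind the abstract reports, although the actual cost values and fold scheme used in the study are not stated here.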

Similar Sources

A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets

...costs for the positive and negative classes, SVM can be extended to the cost-sensitive setting by introducing an additional parameter that penalizes the errors asymmetrically. Consider that we have a binary classification problem, which is represented by a data set $\{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$, where $x_i \in \mathbb{R}^k$ represents a k-dimensional data point and ...

Cost-sensitive Multiclass Classification Risk Bounds

A commonly used approach to multiclass classification is to replace the 0-1 loss with a convex surrogate so as to make empirical risk minimization computationally tractable. Previous work has uncovered sufficient and necessary conditions for the consistency of the resulting procedures. In this paper, we strengthen these results by showing how the 0-1 excess loss of a predictor can be upper bo...

Classification in Imbalanced Datasets

In this thesis we study the classification task in the presence of class imbalanced data. This task arises in many applications when we are interested in the under-represented (minority) classes. Examples of such applications are related to fraud detection, medical diagnosis and monitoring, text categorization, risk management, information retrieval and filtering. Although there exist many stan...

Cost-Aware Pre-Training for Multiclass Cost-Sensitive Deep Learning

Deep learning has been one of the most prominent machine learning techniques nowadays, being the state-of-the-art on a broad range of applications where automatic feature extraction is needed. Many such applications also demand varying costs for different types of mis-classification errors, but it is not clear whether or how such cost information can be incorporated into deep learning to improv...

Cost-sensitive decision tree ensembles for effective imbalanced classification

Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of reducing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on over...

Journal

Journal title: Promet – Traffic & Transportation

Year: 2021

ISSN: 1848-4069, 0353-5320

DOI: https://doi.org/10.7307/ptt.v33i3.3728